Compositional data and Simpson’s paradox
Abstract
Simpson’s paradox, also known as the amalgamation or aggregation paradox, can appear when dealing with proportions. Proportions are, by construction, parts of a whole, and they can be interpreted as compositions if they are assumed to carry only relative information. The Aitchison inner product space structure of the simplex, the sample space of compositions, explains why the paradox appears: amalgamation is a non-linear operation within that structure. Here we propose to use balances, which are specific elements of this structure, to analyse situations where the paradox might appear. With the proposed approach, we find that the centre of the analysed tables is a natural basis for comparing them, and by construction it rules out the possibility of a paradox.
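The reversal, and why a centre-based comparison cannot produce one, can be illustrated with a short Python sketch. This is our own illustration under stated assumptions, not the authors' procedure: each treatment's outcome in each stratum is treated as a two-part composition (success, failure); the balance of a two-part composition is taken as the standard ilr balance b = (1/√2) ln(success/failure); and the centre is the closed geometric mean over strata, with every stratum weighted equally. The counts and the names `counts`, `proportion`, `balance`, `amalgamate` and `centre` are hypothetical. Because the balance of the centre equals the average of the per-stratum balances (a linear operation), a treatment with the larger balance in every stratum must also have the larger centre balance, whereas pooled proportions obtained by amalgamation depend on stratum sizes and can reverse.

```python
import math

# Hypothetical illustrative counts: (successes, failures) per stratum.
# They are chosen so that A beats B in every stratum but loses when pooled.
counts = {
    "A": {"small": (81, 6), "large": (192, 71)},
    "B": {"small": (234, 36), "large": (55, 25)},
}

def proportion(success, failure):
    """Share of successes in the two-part composition (success, failure)."""
    return success / (success + failure)

def balance(success, failure):
    """Two-part ilr balance: sqrt(1*1/(1+1)) * ln(success / failure).

    Only the ratio matters, so raw counts and closed proportions give the
    same value.
    """
    return math.log(success / failure) / math.sqrt(2)

def amalgamate(strata):
    """Add counts across strata (amalgamation); non-linear in the simplex."""
    s = sum(sv for sv, _ in strata.values())
    f = sum(fv for _, fv in strata.values())
    return s, f

def centre(strata):
    """Closed, equally weighted geometric mean of per-stratum compositions."""
    comps = [(s / (s + f), f / (s + f)) for s, f in strata.values()]
    n = len(comps)
    g_s = math.exp(sum(math.log(c[0]) for c in comps) / n)
    g_f = math.exp(sum(math.log(c[1]) for c in comps) / n)
    total = g_s + g_f
    return g_s / total, g_f / total

if __name__ == "__main__":
    # Within each stratum, compare success proportions of A and B.
    for stratum in ("small", "large"):
        pa = proportion(*counts["A"][stratum])
        pb = proportion(*counts["B"][stratum])
        print(f"{stratum}: A={pa:.3f}  B={pb:.3f}")

    # Pooled (amalgamated) proportions: the ordering can reverse here.
    pooled = {t: proportion(*amalgamate(counts[t])) for t in counts}
    print(f"pooled: A={pooled['A']:.3f}  B={pooled['B']:.3f}")

    # Balance of the centre: equals the mean per-stratum balance.
    cb = {t: balance(*centre(counts[t])) for t in counts}
    print(f"centre balance: A={cb['A']:.3f}  B={cb['B']:.3f}")
```

Note that the centre weights each stratum equally, regardless of how many cases it contains; it therefore answers a different question from the pooled success rate rather than correcting it.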
Similar resources
Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data
Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson’s paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson’s...
Simpson’s Paradox in the interpretation of “leaky pipeline” data
The traditional ‘leaky pipeline’ plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson’s paradox can obscure trends in gender ‘leaky pipeline’ plots. Our approach has been to use Excel spreadsheets to generate hypothetical ‘leaky pipeline’ plots of gender inequality within an organisation. The principal factors,...
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining
This paper proposes to integrate two very different kinds of methods for data mining, namely the construction of Bayesian networks from data and the detection of occurrences of Simpson’s paradox. The former aims at discovering potentially causal knowledge in the data, whilst the latter aims at detecting surprising patterns in the data. By integrating these two kinds of methods we can hopefully ...
Simpson’s Paradox – A Survey of Past, Present and Future Research
Simpson’s paradox refers to the reversal of a statistical relationship between two variables in sub-populations when the sub-populations are combined and analyzed as a population. This article is intended to provide a broad survey of the past, present and future research surrounding the issue. Real data from a discrimination litigation case is examined to identify the occurrence of the paradox....
How Likely is Simpson’s Paradox?
What proportion of all 2×2×2 contingency tables exhibit Simpson’s Paradox? An approximate answer is obtained for large sample sizes and extended to 2×2×l tables. Several conditional probabilities of the occurrence of Simpson’s Paradox are also derived. Given that the observed cell frequencies satisfy a Simpson reversal, the posterior probability that the population parameters satisfy the same...